Prostate cancer reshapes the secreted and extracellular vesicle urinary proteomes

Urine is a complex biofluid that reflects both overall physiologic state and the state of the genitourinary tissues through which it passes. It contains both secreted proteins and proteins encapsulated in tissue-derived extracellular vesicles (EVs). To understand the population variability and clinical utility of urine, we quantified the secreted and EV proteomes from 190 men, including a subset with prostate cancer. We demonstrate that a simple protocol enriches prostatic proteins in urine. Secreted and EV proteins arise from different subcellular compartments. Urinary EVs are faithful surrogates of tissue proteomes, but secreted proteins in urine or cell line EVs are not. The urinary proteome is longitudinally stable over several years. It can accurately and non-invasively distinguish malignant from benign prostatic lesions and can risk-stratify prostate tumors. This resource quantifies the complexity of the urinary proteome and reveals the synergistic value of secreted and EV proteomes for translational and biomarker studies.

Proteins enter the urine in two ways: leakage at the glomeruli of the kidney and release throughout the urogenital tract. It is believed that the vast majority of the excreted urinary proteome derives from tissues of the genitourinary tract, rather than leakage from the kidneys. Urogenital proteins can enter the urine via passive release through cell death, through active translocation, and as part of secreted extracellular vesicles (EVs) 11,12. EVs are nanosized particles with a lipid bilayer released by cells into the extracellular milieu. They vary dramatically in size, ranging from 30-2000 nm in diameter, and are heterogeneous in their mechanisms of biogenesis, molecular composition, and function 13. EVs play a crucial role in both physiology and the pathogenesis of diseases, including cancer 14. EV and secreted proteomes are hypothesized to be context-driven and tissue-specific 15, but their presence, population variability, and disease relevance in urine remain poorly characterized.
To fill this gap, we generated comprehensive urinary proteomic profiles from 190 treatment-naïve men with a range of benign and malignant conditions. In this work, we demonstrate a simple protocol that uses urine to directly sample prostate proteins. This allows us to identify the tissue and subcellular origins of urinary proteins and EVs, and to quantify how the urine proteome changes over time in specific individuals. Urinary EVs, but not EVs released from prostate cancer cell lines or secreted urinary proteins, accurately reflect prostatic tissue. Prostate tumor-specific urinary proteins accurately distinguish men with and without prostate cancer and risk-stratify those already with the disease. Canonical EV markers are not effective in urine, but we identify context-dependent urine EV cargo that accurately marks specific urinary EV populations.

Digital rectal examination enriches for prostate proteins
The urine proteome is believed to derive almost exclusively from tissues of the genitourinary tract: the kidney, bladder, and (in men) prostate 11. Perturbation of the prostate gland using digital rectal examination (DRE) enriches for prostate-specific RNAs in urine 16. While the mechanism is unknown, it is thought to occur by expelling prostatic fluid into the urethra, where it can be collected as part of first-catch urine. DREs are routine, minimally invasive physical manipulations performed millions of times annually by oncologists and primary care physicians and thus might provide a simple approach to enrich prostate-derived proteins in urine.
We therefore collected matched pre- and post-DRE urines from ten men (Fig. 1a and Supplementary Data 1) and applied differential ultracentrifugation to separate urine-soluble proteins (uSP) from urinary extracellular vesicles (uEVs). We further isolated two uEV populations based on size 17: one at 20,000 × g (termed uEV-P20) and the other at 150,000 × g (uEV-P150; Supplementary Fig. 1a). To determine if a DRE influenced the biophysical characteristics of EVs, we quantified EV diameter, number, and morphology by nanoparticle tracking and by transmission electron microscopy (Fig. 1b and Supplementary Fig. 1b-d). uEV biophysical characteristics did not differ before and after a DRE (Supplementary Fig. 1b-d).
To evaluate if a DRE increased the abundance of prostate-derived proteins, we measured the proteomes of each urine fraction using mass spectrometry (Supplementary Data 2). Prostate tissue-derived proteins were curated from three independent tissue proteomics datasets 18-20, then annotated in urine. While the total number of urine proteins was unchanged by a DRE (4064 ± 604 pre-DRE vs. 4362 ± 511 post-DRE; P = 0.31; two-sided Wilcoxon signed-rank test), prostate tissue-derived proteins were significantly more abundant in post-DRE urine (P < 2.2 × 10^−16; Fig. 1c-e). These included classic prostate marker proteins like PSMA (FOLH1) and PSA (KLK3) 20 (Fig. 1f and Supplementary Fig. 1e). Thus, a DRE significantly enriches the urine for proteins of prostate origin but does not influence EV biophysical characteristics, suggesting the latter may be relatively tissue-independent.
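The paired comparison of protein counts before and after a DRE uses a two-sided Wilcoxon signed-rank test. The following is a minimal sketch of that test using only the Python standard library, computing an exact P-value by enumerating sign assignments (feasible for ten matched pairs). The function name and the protein counts are hypothetical and for illustration only; they are not the study data, and the authors' actual software is not stated in this section.

```python
from itertools import product


def wilcoxon_signed_rank_exact(before, after):
    """Exact two-sided Wilcoxon signed-rank test for matched samples.

    Returns (W, p) where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped; ties get average ranks.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)

    # Rank the absolute differences, averaging ranks within ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)

    # Under the null, each difference is equally likely to be + or -.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            extreme += 1
    return w_obs, extreme / 2 ** n


# Hypothetical per-patient protein counts (illustrative values only).
pre_dre = [3900, 4100, 3500, 4700, 4300, 3600, 4800, 3950, 4050, 3740]
post_dre = [4200, 4000, 4350, 4900, 4150, 4500, 4700, 4600, 3900, 4320]

w, p = wilcoxon_signed_rank_exact(pre_dre, post_dre)
print(f"W = {w}, two-sided P = {p:.3f}")
```

With larger cohorts the enumeration becomes impractical, and a normal approximation or a statistics library would normally be used instead.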

The subcellular origins of urinary proteins
To investigate urinary protein heterogeneity, we next collected post-DRE urines from 190 men: 64 men with no cancer diagnosis and 126 with untreated prostate cancers. The prostate cancer patients reflected the full risk spectrum of primary disease, with biopsy ISUP Grade Groups (GG) ranging from low (GG 1) to high (GG 5; Fig. 2a). We isolated uSP, uEV-P20 and uEV-P150 from post-DRE urine and quantified their proteomes (Fig. 2b; Supplementary Data 1). uEV biophysics and urine protein counts were largely independent of disease status, age or serum PSA levels (Supplementary Fig. 2a-g). We observed an increasing trend in uEV particle count in cISUP GG 1-3 followed by a decrease in particle count in cISUP GG 4-5, although this finding requires further validation in larger cohorts due to the low numbers of patients in each group (Supplementary Fig. 2c). The uEV-P20 and uEV-P150 fractions were biophysically similar in size, morphology, and particle count (Fig. 2c-e).
Despite these biophysical similarities, the proteomes of different urine fractions were significantly different. The uEV-P20 fraction had the most detectable proteins (Fig. 2f) and was the most biophysically and proteomically diverse, suggesting heterogeneity in vesicular type or origin (Fig. 2g). uEV isolation was highly reproducible, with larger proteomic differences observed between patients and clinical groups compared to within samples (Supplementary Fig. 3a, b). Of the 6518 proteins detected in uEVs, 60% were identified as EV cargo in previous studies (Supplementary Fig. 3c, d) 17,21-24. uEV fractions were more similar to one another than to the soluble protein fraction, consistent with the detected EV proteins being true cargo rather than co-isolated urinary proteins (Supplementary Fig. 3e-g). To determine the differential origins and biology represented by each fraction, we performed differential proteome analysis. We identified proteins detected in one fraction but not the others (Fig. 2h) and proteins detected at different abundances across fractions (Fig. 2i). Each fraction was defined by presence of ~50 proteins and differential abundance of ~500 others (Fig. 2j). Fraction-specific proteins tended to arise from specific subcellular compartments (Supplementary Data 3). Urine-soluble proteins were typically secreted or derived from the Golgi apparatus, while uEV-P20 proteins derived from mitochondria or endoplasmic reticulum and uEV-P150 proteins from the plasma membrane (Fig. 2k).
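The subcellular enrichment analysis is reported in the figure legend as an odds ratio with Fisher's exact test. As a rough illustration of how one compartment could be tested for one fraction, the sketch below builds a 2 × 2 table and computes the sample odds ratio and a two-sided exact P-value from the hypergeometric distribution. The function name and all counts are hypothetical, and the multiple-testing correction (FDR) applied in the study is not shown.

```python
from math import comb


def fisher_exact_enrichment(a, b, c, d):
    """Two-sided Fisher's exact test on the 2 x 2 table [[a, b], [c, d]].

    a: fraction-enriched proteins annotated to the compartment
    b: fraction-enriched proteins not annotated to it
    c: background proteins annotated to the compartment
    d: background proteins not annotated to it
    Returns (odds_ratio, p_value).
    """
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")

    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def pmf(x):
        # Hypergeometric probability of x annotated proteins in row 1.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_obs = pmf(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables at least as unlikely as the observed one.
    p_value = sum(
        pmf(x) for x in range(lo, hi + 1) if pmf(x) <= p_obs * (1 + 1e-9)
    )
    return odds_ratio, min(p_value, 1.0)


# Hypothetical counts: 120 of 550 fraction-enriched proteins are
# mitochondrial, versus 400 of 6000 background proteins.
or_, p = fisher_exact_enrichment(120, 430, 400, 5600)
print(f"OR = {or_:.2f}, P = {p:.3g}")
```

An odds ratio above 1 indicates that the compartment is over-represented among the fraction-enriched proteins relative to the background.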

uEVs but not cEVs reflect the prostate tissue proteome
We next quantified how well each of the three urine fraction proteomes reflects the proteome of prostate tissue 18,19. A majority (67%) of all proteins detected in prostate tissue were detected in one or more of the three urine fractions (Fig. 3a). EV fractions were a much richer source of prostate-derived proteins than urine-soluble proteins. Only 116 prostate-derived proteins were identified in unfractionated urine, whereas 2439 were identified only in one or both EV fractions and 4968 in both EV and non-EV urine. Protein abundances were well-correlated between urinary and tissue proteomes (Fig. 3b), with uEV-P20 being the best surrogate for prostate tissue.
To further quantify the tissue provenance of urinary proteins, we used RNA-seq data from normal tissues (GTEx 25) and from normal tissue adjacent to tumors (NAT, TCGA 26-28) to identify transcripts enriched in prostate, kidney or bladder (Fig. 3c). Differentially abundant transcripts were then used as signatures of tissue origin to quantify the contribution of each tissue to each sample. In uEVs, prostate proteins were significantly more abundant than non-prostate proteins (P < 1 × 10^−5). In contrast, the soluble urine fraction (uSP) showed the inverse trend: it was significantly depleted in prostate-derived proteins (P < 2.2 × 10^−16; Fig. 3d). The uSP fraction was enriched in functions classically associated with blood (lipoprotein, blood microparticle; Fig. 3e). The soluble proteome was also highly enriched in cell surface proteins, likely from the shedding of extracellular domains 29,30. Proteins involved in multivesicular body biogenesis were over-represented in uEV-P150, suggesting an enrichment of exosomes 31. Consistent with the univariate protein analysis, the uEV-P20 fraction more closely reflected prostate tissue, harboring proteins originating from mitochondria, ribosome, and extracellular matrix 18. These data suggest that the prostate primarily sheds EVs into urine and that these can provide a robust non-invasive proxy of its proteome.
Cell line conditioned media has been widely used to study secreted cellular components 32,33. To ascertain if cell line-derived EVs (cEVs) accurately reflect tumor tissue, we isolated and proteomically characterized cEVs from five prostate cell lines (Fig. 4a). EVs from urine and from cell line conditioned media displayed similar biophysical characteristics (Supplementary Fig. 4a-d). More proteins were detected in cEV-P20 than in cEV-P150, but fewer than in whole-cell lysates (Supplementary Fig. 4e, f). Cell line whole-cell lysates closely resembled tumor tissue in both protein composition (Fig. 4b) and abundance (Fig. 4c). By contrast, uEVs and cEVs differed, particularly in abundance (Fig. 4d, e). Urinary EVs were better surrogates for prostate tissue than EVs derived from cell lines (Supplementary Fig. 4g). Commonly used EV markers like CD9, CD81 and CD63 34,35 were amongst the proteins more abundant in EV-P150 than EV-P20 in both cell lines and urines (Fig. 4f and Supplementary Fig. 4h), while other EV markers like FLOT1 were discordant between cell lines and urine (Fig. 4f). PSA protein expression in whole-cell lysates was consistent with the androgen sensitivity of the cell lines, with androgen-insensitive cell lines DU145, PC3 and RWPE1 not expressing PSA and androgen-sensitive cell lines 22Rv1 and LNCaP expressing PSA (Supplementary Data 4) 36. In cEVs, only EVs isolated from LNCaP had detectable PSA. Mitochondrial proteins were over-represented in cEV-P20 and plasma membrane proteins in cEV-P150 (Fig. 4g). Thus, post-DRE EV-associated proteins more accurately reflect prostate tissue than prostate cancer cell line EVs or, particularly, post-DRE urine-soluble proteins.

Biomarker potential of urinary proteomes
For a biomarker to be useful, it needs to be robust to a variety of types of errors. While urine is not prone to spatial biases, it is unclear to what extent an individual's urine proteome is stable over time. To quantify the temporal stability of post-DRE urine, we evaluated longitudinal samples in five prostate cancer patients over several years. From each patient, we collected post-DRE urine at multiple time points; all patients were managed by active surveillance without any indication of clinical progression (Fig. 5a; Supplementary Data 1). We used variance analysis to quantify which proteins were more variable between samples of a single patient (intra) or across individuals (inter). On average, proteomes were more similar within patients than between them, for both EV cargo and urine-soluble proteins (Fig. 5b). To identify proteins that might be particularly useful as biomarkers, we identified those that were longitudinally stable using the intraclass correlation coefficient (ICC) 37. The higher a protein's ICC, the less its variance in protein abundance is caused by random fluctuations over time. A subset of proteins was highly longitudinally stable in each fraction (Supplementary Fig. 5a), and these comprise excellent candidate biomarkers because they are robust to physiological variability over several years.
Next, we sought to evaluate the biomarker potential of urine-soluble and uEV proteins for predicting prostatic disease. Men with benign prostatic conditions (non-cancer [NC]) included individuals with elevated serum PSA levels and benign prostatic hyperplasia as well as patients with no diagnosed prostate cancer on transrectal ultrasound-guided 12-core biopsy (Supplementary Data 1). Patients with and without prostate cancer had similar serum PSA abundances (P = 0.34). Despite this similarity, thousands of soluble (Fig. 5c) and EV proteins (Fig. 5d, e) differed between patients with and without cancer. The specific differentially abundant proteins varied between urinary fractions (Supplementary Fig. 5b). Of these proteins that were differentially abundant in prostate cancer and non-cancer uEVs, 541 proteins were uniquely detected in our dataset compared to other datasets of post-DRE urine-derived uEVs from patients with prostatic disease 17,21 (Supplementary Data 4). In addition, only 21 proteins were unique to a single disease subgroup (i.e., non-cancers or specific cISUP GG). These results suggest that differences in disease groups are reflected in the differential abundance of proteins in the uEV proteome.
Pathways associated with malignancy in prostate tissue also differed from those in urine (Supplementary Fig. 5c). For example, proteins regulated by androgen response, such as PSA, had increased abundance in tumor tissue compared to NAT tissue, but had the opposite trend in urine-soluble proteins and uEVs. Consistent with other studies 38,39, PSA (KLK3) protein abundance was significantly reduced in all fractions of urine from prostate cancer patients relative to men without a cancer diagnosis, despite being increased in serum and in tumor regions 19 (Fig. 5c-e). Therefore, our data suggest that pathways associated with malignancy are reflected in different subsets of the prostate urinary proteome.
To create biomarkers of prostatic disease, we focused on proteins that were frequently detected (in >50% of samples), enriched in a urinary fraction, prostate-derived, and longitudinally stable (ICC > 0.4) (Fig. 6a). This filtering strategy retained 226 uSP proteins, 280 uEV-P20 proteins, and 235 uEV-P150 proteins. We used statistical machine learning to create and validate classifiers independently for each urine fraction. First, we created classifiers that distinguish cancer from non-cancer based solely on urine proteins; these had AUCs ranging from 0.92-1, significantly outperforming serum PSA (Fig. 6b). In an independent, prospective validation cohort of 30 patients with proteomic data from all three urinary fractions (Supplementary Data 1), proteins had concordant effect sizes and validation AUCs ranging from 0.71-0.81 (Fig. 6c, d). Next, we applied the same methodology to the more challenging task of distinguishing low-grade from high-grade cancer. This is an important clinical question, as low-grade disease is typically managed by active surveillance, and higher-grade disease by definitive local therapy. Urine distinguished low- from high-grade prostate cancer with AUCs ranging from 0.73-0.79 (Supplementary Fig. 6a), again matching or exceeding serum PSA. In two independent validation cohorts of 199 and 75 patients (Supplementary Data 1), effect sizes of uSP proteins between the discovery and validation cohorts were largely concordant (Supplementary Fig. 6b). Intriguingly, signatures of grade also informed on disease status, reflecting an overlap in determinants of initiation and progression, as seen in studies of prostate cancer genetic drivers 40 (Supplementary Fig. 6c). Proteins within these signatures exhibit longitudinal stability and were univariately associated with disease status (Supplementary Fig. 6d). These data suggest that the urine proteome is an untapped source of biomarkers for genitourinary disease.
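The longitudinal-stability filter (ICC > 0.4) combined with the detection-frequency filter (>50% of samples) can be sketched as follows. The text does not state which ICC form was used, so this example assumes a one-way random-effects ICC for a balanced design (the same number of time points per patient). The function names and the abundance values are hypothetical.

```python
def icc_oneway(measurements):
    """One-way random-effects ICC for a balanced design (an assumption).

    measurements: list of per-patient lists, each holding k repeated
    abundance values for one protein.
    """
    n = len(measurements)
    k = len(measurements[0])
    grand_mean = sum(sum(row) for row in measurements) / (n * k)
    patient_means = [sum(row) / k for row in measurements]

    # Between-patient and within-patient mean squares.
    msb = k * sum((m - grand_mean) ** 2 for m in patient_means) / (n - 1)
    msw = sum(
        (x - m) ** 2
        for row, m in zip(measurements, patient_means)
        for x in row
    ) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


def passes_filters(detection_fraction, icc,
                   min_detection=0.5, min_icc=0.4):
    """Apply the two numeric thresholds quoted in the text."""
    return detection_fraction > min_detection and icc > min_icc


# Hypothetical log-abundances: five patients, three time points each.
protein = [
    [20.1, 20.3, 19.9],
    [22.4, 22.0, 22.5],
    [18.7, 19.0, 18.8],
    [21.2, 21.5, 21.1],
    [23.0, 22.8, 23.3],
]
icc = icc_oneway(protein)
print(f"ICC = {icc:.2f}, candidate = {passes_filters(0.8, icc)}")
```

An ICC close to 1 means that almost all variance lies between patients rather than between time points within a patient, which is the property the stability filter selects for.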

Markers of prostate tumor uEVs
Post-DRE urine can non-invasively sample the prostate tissue proteome and has significant biomarker potential. However, the soluble fraction and the protein cargo of different-sized EVs differ in their origin and association with disease phenotypes. As examples, SPOCK1 protein was significantly lower in the soluble fraction of urine from cancer patients but not in uEVs, while PCYOX1 shows the inverse (Supplementary Fig. 6d). This interplay between subcellular origin, clinical phenotype, and urine fraction is summarized in Fig. 7a and suggests that protein cargo is selectively packaged into EVs in processes that are dysregulated during malignant transformation (Supplementary Fig. 7a).
This heterogeneity in origins and association with clinicoepidemiologic characteristics highlights the importance of accurately isolating specific EV populations. The gold-standard intensive centrifugation used here is expensive and time-consuming, so affinity-based isolation methods are strongly preferred for rapid translational and clinical studies. Affinity methods require the identification of proteins characteristic of specific EV subpopulations 41. We evaluated the performance of 13 EV markers that are ubiquitous to multiple human cancer tissues and fluids but not yet evaluated in prostate urinary EVs 41. Four of these were elevated in uEV-P150 relative to soluble urine proteins and five were elevated in uEV-P20. Others were depleted in uEVs, and none showed the large effect sizes or small inter-sample variability needed for ideal affinity-based markers (Fig. 7b). Thus, canonical EV protein markers do not appear to be optimal for uEV identification and isolation. We therefore sought to identify protein markers to distinguish prostate-derived uEVs. We selected proteins that were both distinct to one urine fraction (Fig. 2j) and were known to be present in prostate tissue. We separated these into three protein subsets: those associated with presence of a tumor, those associated with tumor grade and disease-invariant "core" proteins (Fig. 7c). These subsets were functionally distinct, with prostate-specific secreted proteins such as KLK3 and ACPP (Cluster C uSP 8) being specific to the urine-soluble protein fraction but not uEVs, and GTP binding proteins specific to the uEV-P150 fraction (Cluster C P150 8) 15 (Supplementary Fig. 7b-d).

Figure legend (panels g-k): g Fraction of uEV-P20 and uEV-P150-unique proteins from 124 patients with matched uEV fractions. h Number of samples each protein was detected in (n = 96 patients). For each pairwise comparison, the numbers of proteins present in >90% of samples in one sample type and <10% of the other are labeled on top of each panel. i Differences in shared protein abundance between fractions. Significant differences (FDR < 0.05, two-sided Wilcoxon signed-rank test) are in green (uEV-P20), pink (uEV-P150) or yellow (uSP). n.s.: non-significant. Total differentially abundant proteins in bottom corners. n = 96 patients. j Fraction-enriched proteins either unique to one fraction or differentially abundant in one fraction relative to the other two. k Odds ratio (OR) of gene set enrichment for each subcellular localization 62 for proteins from (j). Grey background shading indicates FDR < 0.05 (Fisher's exact test). Boxplots are shown with the line indicating the sample median, the box indicating the 25th and 75th percentiles, and the whiskers indicating the ±1.5 × interquartile range (IQR). Source data are provided as a Source Data file.

Figure legend (fragment): Spearman's rank correlation and its P-value (two-tailed) are shown. c Analysis strategy to identify tissue-associated genes in RNA-seq of normal or normal adjacent to tumor (NAT) prostate, bladder, and kidney from the Genotype-Tissue Expression project (GTEx) 25 or The Cancer Genome Atlas (TCGA) 26-28
Finally, to identify specific actionable markers for uEV affinity studies, we selected predicted cell surface proteins from each subset 42 (Fig. 7c and Supplementary Fig. 7e, f). The resulting five uEV-P20 and ten uEV-P150 tumor markers that differed in patients with cancer and benign disease include the classical EV marker CD63 34,35 (Fig. 7d). Grade markers (Supplementary Fig. 7e) contain markers such as ITGB2 and SLC4A1 in the uEV-P20 fraction. Disease-invariant core markers of uEV-P150 include classical EV markers CD9 and CD81, which were frequently detected in uEV-P150 (Supplementary Fig. 7f). These data indicate that the EV proteome is context-specific (Supplementary Fig. 7g), and that protocols for rapid, specific isolation of EV subpopulations may differ from those useful in plasma or some other tissues.

Discussion
Urine contains a complex mixture of proteins that differ in their form of release and tissue of origin, resulting in a dynamic range of concentrations spanning ten orders of magnitude 4. To better define the urinary proteome, we used fractionation to distinguish urine-soluble proteins from proteins carried in urinary extracellular vesicles (uEVs). uEVs of different sizes and densities contained proteins from different subcellular origins, suggesting distinct biogenesis 43,44. uEVs, particularly the uEV-P20 population, appear to derive heavily from prostate tissue, and to accurately reflect its proteome. By contrast, neither the soluble urine proteome nor cell line-derived EVs did so. This highlights the need to prioritize patient-derived EVs in translational studies. This is particularly key for diseases where models that faithfully recapitulate aspects of the natural history of cancer are lacking, such as hormone sensitivity and hypoxia in prostate cancer.
Prior work has rigorously quantified the role of factors such as sample collection, processing 45,46, and storage 47 on urine proteomes. We show that digital rectal examinations are a simple way to enrich for prostate proteins and EVs in first-catch urine. While post-DRE urine proteomes can vary over time 4,48, a subset of specific proteins are temporally stable over many months and are well suited for non-invasive sampling. Our analysis of temporal stability of the urine proteome was restricted to prostate cancer patients who all had cISUP GG 1 tumors, similar serum PSA levels, and no clinical upgrading over a period of 1-2 years. Follow-up studies utilizing larger, clinically heterogeneous cohorts will be needed to verify if the temporally stable proteins identified in low-grade patients generalize to patients with other tumor grades. We found that the protein composition of urinary proteomes differs as a function of tumor grade, and when comparing prostate cancers to non-cancers. Consistent with data from other groups, prostate proteins such as PSA were elevated in the urine and urinary EVs of patients without cancer compared to those with prostate cancer, while serum PSA levels display the opposite trend 38,39,49. This is likely due to the disruption of prostate cellular architecture with increasing tumor grade, which results in elevated levels of PSA in the bloodstream and consequently decreased levels of PSA in urine 38. These differences in the secreted proteome between groups likely reflect altered signaling and intercellular communication by cancer cells, or a compensatory mechanism for altered tumor metabolism or cellular stress 50,51. Therefore, we show that EV cargo is context-dependent and is associated with different cellular phenotypes, such as disease.
Further proteomic interrogation of urinary EVs in individuals of different ancestries, ages, and in different clinical scenarios 52 could offer new insights into genitourinary biology. However, the complexity of EV isolation by differential ultracentrifugation and the requirement for high starting material can be limiting 17. Affinity-based capture of specific EV subpopulations using cell surface markers is appealing 53. Intriguingly, uEVs appear to be characterized by cell surface markers different from other EV subpopulations 41. Thus, new protocols to rapidly isolate EVs from urine are urgently needed to maximize its utility for biomarker and other translational studies. In this study, we used differential ultracentrifugation, which is a less specific EV isolation approach than affinity-based capture or differential gradient ultracentrifugation, both of which can isolate EV populations with higher specificity 54. As such, the selectivity, protein composition, and protein abundance in EVs may differ depending on the choice of the EV isolation method. Future experiments will be needed to verify if these putative markers are able to capture similar EV subpopulations.
Since our dataset is one of the largest urine proteomics studies, we were able to leverage urinary fraction-specificity, frequency of detection in prostate tissues, and longitudinal stability to prioritize proteins as new biomarker candidates for prostatic diseases. The proteomics dataset generated in this study is one of the larger discovery cohorts (>100 patients) for prostate cancer biomarker discovery in urine, including for urinary EVs, and is well-powered to detect log2 fold changes in protein abundance of 1. However, while many of the proteomic differences between groups (e.g., cISUP GG) are concordant in independent cohorts, these differences are small. Further validation of the urinary signatures using robust quantitation by targeted proteomics assays with stable isotope labeled standards 52,55 in larger, racially diverse validation cohorts is needed. The full data resource presented here is accessible via an interactive portal (http://kislingerlab.uhnres.utoronto.ca/ev/home/) to facilitate investigation into the urinary secreted and EV proteomes.

Urine collection
The first 15 mL of first-catch urine after a digital rectal exam (post-DRE urine) was collected following a gentle massage of the prostate gland during DRE prior to biopsy 2. For the DRE cohort (Supplementary Data 1), which comprised ten men with clinical ISUP GG 1 tumors, mid-stream urine was collected an hour before the DRE massage (pre-DRE urine). Matched post-DRE urine was also collected for these ten men. The longitudinal cohort comprised five men with cISUP GG 1 tumors who were on active surveillance and did not upgrade in the period of 12-16 months after their first DRE (Supplementary Data 1). Serial post-DRE urine was collected for each patient at three time points. Each time point was 3-12 months apart. For assessing the reproducibility of uEV isolation in prostate cancer patients, the first 50 mL of first-catch post-DRE urine was collected from three men with cISUP GG 1 tumors. For assessing the reproducibility of uEV isolation in men without prostate cancer, we pooled post-DRE urines from 10 men with benign prostatic hyperplasia and pooled post-DRE urines from 10 men with elevated serum PSA but no prostate cancer detected on needle biopsy. Pre- and post-DRE urine was centrifuged at 2000 × g for 15 min at 4 °C to pellet cellular debris, and the resulting urine supernatant was stored at −80 °C.

... with 100 mL of media or ten 15 cm plates (total area = 1480 cm^2) in 20 mL of media each and cultured in a 37 °C incubator with 5% CO2. RPMI media (Gibco) supplemented with 10% fetal bovine serum and 1% penicillin-streptomycin-glutamine (PSG) was used for the prostate cancer cell lines (DU145, PC3, 22Rv1 and LNCaP), and keratinocyte serum-free media supplemented with 0.05 mg mL^−1 bovine pituitary extract, 5 ng mL^−1 epidermal growth factor, and 1% PSG was used for the RWPE1 cell line.

EV isolation from urine
Urinary extracellular vesicles (uEV) were isolated by differential ultracentrifugation 17. Briefly, 14 mL of frozen urine supernatant was thawed at 4 °C, then diluted to a volume of 35 mL with isotonic buffer (250 mM sucrose, 10 mM HEPES, 1 mM EDTA, pH 7.4). The urine was centrifuged at 20,000 × g for 30 min at 4 °C (k-factor 1790) in an Optima XPN-80 ultracentrifuge (Beckman Coulter) equipped with a SW32Ti swinging bucket rotor (R min 67, R max 153, Beckman Coulter) to pellet EVs. The 20,000 × g pellet (P20) was treated with 500 mM of dithiothreitol (DTT) at 37 °C for 30 min to reduce the uromodulin network and centrifuged a second time at 20,000 × g for 30 min at room temperature. The P20 pellet was resuspended in 1 mL of cold PBS and centrifuged at 18,210 × g for 30 min (Eppendorf Centrifuge 5430 R, FA-45-48-11 rotor, k-factor 198). The supernatants from the first and second centrifugation steps were combined and centrifuged at 150,000 × g for 2 h at 4 °C (SW32Ti swinging bucket rotor, k-factor 239) in an ultracentrifuge to pellet EVs. The 150,000 × g pellet (P150) was resuspended in a high pH buffer and then passed twice through a 0.22 µm filter. Samples were centrifuged again at 150,000 × g for 2 h at 4 °C to pellet uEV-P150. The P20 and P150 pellets containing uEV-P20 and uEV-P150, respectively, were resuspended in 100 µL of 50% 2,2,2-trifluoroethanol (Sigma-Aldrich) in PBS, flash-frozen in liquid nitrogen, and stored at −80 °C until proteomics analysis.
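The k-factors quoted for the SW32Ti spins can be cross-checked against the stated relative centrifugal forces and rotor radii. The sketch below uses the standard relations RCF = 1.118 × 10^−5 × r (cm) × rpm^2 and k = 2.533 × 10^11 × ln(r_max / r_min) / rpm^2. It assumes that the quoted × g values refer to the maximum radius and that the radii are given in millimetres; neither is stated explicitly in the protocol, and the function names are hypothetical.

```python
import math


def rpm_for_rcf(rcf_g, radius_mm):
    """Rotor speed (rpm) that yields a given RCF at a given radius."""
    radius_cm = radius_mm / 10.0
    return math.sqrt(rcf_g / (1.118e-5 * radius_cm))


def k_factor(rpm, r_min_mm, r_max_mm):
    """Rotor clearing (k) factor at a given speed."""
    return 2.533e11 * math.log(r_max_mm / r_min_mm) / rpm ** 2


# Radii as quoted for the SW32Ti rotor (assumed to be in mm).
R_MIN_MM, R_MAX_MM = 67.0, 153.0

for rcf in (20_000, 150_000):
    rpm = rpm_for_rcf(rcf, R_MAX_MM)  # assumes RCF quoted at r_max
    k = k_factor(rpm, R_MIN_MM, R_MAX_MM)
    print(f"{rcf} x g -> {rpm:.0f} rpm, k-factor {k:.0f}")
```

Under these assumptions the computed values fall within about 1% of the quoted k-factors of 1790 and 239, which supports reading the × g values as referring to the maximum radius.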

EV isolation from cell culture-conditioned media
All cell lines were grown to 70-80% confluency, then washed three times with phosphate-buffered saline (PBS) and serum-starved for 48 h prior to collection of conditioned media. The cell line conditioned media containing EVs (cEV) was collected and centrifuged at 500 × g for 10 min, then at 2000 × g at 4 °C for 30 min to clear cell debris. The supernatant was concentrated to a volume of 4-5 mL (if using EVs for biophysical studies) or to a volume of 20-30 mL (if using EVs for proteomics) in a 100 kDa MWCO ultrafiltration concentrator (Millipore). EVs were isolated from conditioned media by differential ultracentrifugation in a SW32Ti swinging bucket rotor as described above. Conditioned media was topped off with PBS as required. Unlike the EV isolation protocol for urine described above, the first P20 pellet was not treated with DTT, as cell line conditioned media is not expected to contain uromodulin. cEVs were collected from the 20,000 × g (cEV-P20) and 150,000 × g (cEV-P150) pellets.

Urine proteomics
Proteomic profiles of the soluble protein fraction were generated from 250 µL of urine supernatant (following 2000 × g centrifugation). Urine was prepared for proteomics using the MStern protocol 56. For each sample, 2 pmol of Saccharomyces cerevisiae invertase was added as a sample preparation control. Proteins in each sample were reduced with 5 mM DTT and incubated for 30 min at 60 °C. To prevent reformation of disulfide bonds, 25 mM iodoacetamide was added and samples were incubated at room temperature for 30 min in the dark. The liquid in the following steps was passed through the MStern wells using vacuum suction unless otherwise stated. The polyvinylidene fluoride membrane (Millipore Sigma, MSIP4510) was equilibrated with 50 μL of 70% ethanol, then washed twice with 100 mM ammonium bicarbonate (ABC). Samples were added to the wells and passed through the membrane by vacuum suction. Each well was washed twice with 100 μL of ABC to remove salts, then proteins were digested with 1 μg of mass spectrometry grade Trypsin/Lys-C enzyme mix (Promega) in 50 μL of digestion buffer (100 mM ABC, pH 8.0, 1 mM CaCl2, 5% acetonitrile). To ensure that the proteins were in contact with the digestion buffer, the digestion buffer was passed through the membrane by centrifugation, and the flow-through was reapplied on top of the membrane. Protein digestion was performed at 37 °C for four hours. Samples were resuspended in the well by gentle pipetting every two hours. Peptides were collected by centrifugation, and remnant membrane-bound peptides were eluted with 50 μL of 50% acetonitrile and combined with the previous flow-through. Samples were dried in a SpeedVac vacuum concentrator (Thermo). Dried peptides were resuspended in 0.1% trifluoroacetic acid in water and desalted using homemade solid phase extraction stage tips containing 3 plugs of 3M™ Empore™ C18 membrane 57. Peptides were quantified by NanoDrop (Thermo Scientific), and 2 μg of peptides was loaded on the column.

Extracellular vesicle proteomics
EVs or whole-cell lysates in 50% 2,2,2-trifluoroethanol (Sigma-Aldrich) were lysed by freeze-thaw, then incubated at 60 °C for 1 h to extract proteins. Proteins were then reduced with 5 mM DTT, alkylated with 25 mM iodoacetamide, and digested overnight at 37 °C with 2 μg of Trypsin/Lys-C enzyme mix (Promega). The next day, the enzymatic digest was quenched with 1% formic acid, and samples were desalted with homemade C18 StageTips (see above) prior to LC-MS analysis. iRT peptide standards (Biognosys) were spiked into reconstituted peptides at a 1:100 dilution according to the manufacturer's instructions.

Mass spectrometry and data processing
Peptides were separated on a 50 cm C18 reverse phase EASY-Spray LC column (Thermo ES803) with trap column (Acclaim™ PepMap™ 100 C18) interfaced with an EASY-nanoLC 1000 system over a 2 h gradient (EVs and urine) or 4 h gradient (whole-cell lysates). Mass spectrometry was performed on a Q-Exactive HF, Orbitrap Fusion Tribrid or Orbitrap Fusion Lumos mass spectrometer coupled to an EASY-Spray ESI source (Thermo Scientific). Mass spectrometry data acquisition parameters and replication for each cohort are listed in Supplementary Data 5. All datasets were acquired in data-dependent acquisition mode. Raw files for each urine fraction and cohort were searched separately in MaxQuant 58 (v.1.5.8.3 for uSP samples and v.1.6.2.3 for uEV samples) at a single site using a UniProt human protein sequence database (complete human proteome with isoforms). For cohorts with samples processed or acquired in replicates, protein intensities were combined in MaxQuant. Searches were performed with trypsin cleavage at lysine and arginine, maximum of two missed cleavages, peptide length 7-25 amino acids, and carbamidomethylation of cysteine as a fixed modification. Variable modifications were set as oxidation of methionine and acetylation of the protein N-terminus. The false discovery rate for the target-decoy search was set to 1% for protein and peptide. Peptide detection was performed with an initial precursor and fragment mass deviation threshold of 10 and 20 parts per million respectively. Intensity-based absolute quantification (iBAQ), label-free quantitation, and match between runs (matching and alignment time windows set as 0.7 and 20 min respectively) were enabled. The peptides.txt output files from each MaxQuant search were parsed into an in-house database for protein grouping 59. Protein abundances (gene-centric) were determined from peptide abundances using the iBAQ algorithm 60 (Supplementary Data 2). Reverse hits (false positives from target-decoy search) were removed, and proteins detected with two or more peptides were carried forward. Raw iBAQ intensities were normalized using median normalization. Median-normalized values were used for all analyses unless stated otherwise. All further data analysis was performed in the R statistical environment (v.4.2.1).
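The filtering and normalization steps described above can be sketched as follows. This is a minimal Python illustration, not the authors' R code: the function name, the dictionary layout, and the choice to divide each sample by its own median of detected intensities are assumptions, since the text specifies only "median normalization".

```python
from statistics import median

def filter_and_normalize(ibaq, peptide_counts, min_peptides=2):
    """Keep proteins supported by >= min_peptides peptides, then median-normalize.

    ibaq:           {sample: {protein: raw iBAQ intensity}} (hypothetical layout)
    peptide_counts: {protein: number of peptides detected}
    Assumption: each sample is divided by the median of its detected intensities.
    """
    kept = {p for p, n in peptide_counts.items() if n >= min_peptides}
    normalized = {}
    for sample, values in ibaq.items():
        # Zero or missing intensities are treated as "not detected".
        detected = {p: v for p, v in values.items() if p in kept and v and v > 0}
        if not detected:
            normalized[sample] = {}
            continue
        sample_median = median(detected.values())
        normalized[sample] = {p: v / sample_median for p, v in detected.items()}
    return normalized
```

After this step the median detected protein in every sample has a value of 1, which makes samples with different total protein loads comparable.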

EV isolation for biophysical studies
EVs were isolated from urines or cell line conditioned media for nanoparticle tracking analysis (NTA) and transmission electron microscopy (TEM) as described above, with the following changes. A volume of 5 mL of fluid was used for EV isolation using a SW55 Ti swinging bucket rotor (R min 61, R max 109, Beckman Coulter). To keep the k-factor consistent with the SW32Ti rotor at each centrifugation step, and taking into account the time needed for the rotor to reach its desired speed (approximately 5 min), centrifugation times were reduced to 20 min and 1 h for the 20,000 × g (k-factor 699) and 150,000 × g (k-factor 120) centrifugation steps, respectively. EV pellets were resuspended in 100-200 µL of cold, 0.22 µm filtered PBS, and stored at 4 °C for no more than 16 h prior to NTA or TEM analysis.
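The rotor arithmetic behind this adjustment can be sketched with the standard clearing-factor relations: RCF = 1.118 × 10⁻⁶ · r(mm) · rpm², k = 2.533 × 10¹¹ · ln(R max / R min) / rpm², and t₁/k₁ = t₂/k₂ for equivalent pelleting. The Python below is illustrative only. The function names are hypothetical, the rotor radii are assumed to be in millimetres, and the radius at which each g-force is quoted is not stated in the text, so it is left as a parameter.

```python
import math

def rpm_from_rcf(rcf, radius_mm):
    """Rotor speed (rpm) that produces `rcf` (x g) at the given radius in mm."""
    return math.sqrt(rcf / (1.118e-6 * radius_mm))

def k_factor(rpm, r_min_mm, r_max_mm):
    """Clearing factor of a rotor at a given speed (smaller k pellets faster)."""
    return 2.533e11 * math.log(r_max_mm / r_min_mm) / rpm ** 2

def equivalent_run_time(t_ref, k_ref, k_new):
    """Run time on a second rotor matching t_ref on the reference rotor (t1/k1 = t2/k2)."""
    return t_ref * k_new / k_ref

# SW55 Ti radii from the text; taking 150,000 x g at R max is an assumption.
rpm_150k = rpm_from_rcf(150_000, 109)
k_150k = k_factor(rpm_150k, 61, 109)
```

Under that assumption the computed k-factor is close to the reported value of 120 for the 150,000 × g step.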

Transmission electron microscopy
TEM was performed at SickKids Nanoscale Biomedical Imaging Facility and Princess Margaret Cancer Centre. Samples were deposited on formvar carbon-coated grids, washed once with water, and stained twice with uranyl acetate. Images were acquired on a Tecnai 20, Hitachi HT7800, or Talos™ F200X G2 transmission electron microscope. Images were processed with ImageJ (v.1.53t) for visualization.

Nanoparticle tracking analysis
NTA for 34 men (Supplementary Data 1) with uEV-P20 and uEV-P150 was performed using a NanoSight LM10 system configured with a 405 nm laser and a high-sensitivity sCMOS camera. Camera settings were as follows: screen gain 3.0; camera level 11; 25 frames per second; slider gain 146. Each sample was diluted in particle-free PBS and introduced manually. Analysis was performed with NTA software (v.3.1 build 3.1.46). One technical replicate was captured per sample. For each replicate, three 30 s videos were captured, with approximately 20-200 particles in the field of view for each measurement. Ambient temperature was set at 22 °C.
NTA for the DRE cohort (pre-DRE vs. post-DRE urine from two matched men) and cEVs (cEV-P20 and cEV-P150) was performed using a NanoSight NS300 system (Malvern) configured with a 405 nm laser and a high-sensitivity sCMOS camera. Each sample was diluted in 0.22 µm filtered PBS and introduced with a syringe pump at 60 μL min⁻¹. Analysis was performed with NTA software (v.3.4). For every sample, two to four technical replicates were captured. For each technical replicate, three 30 s videos were captured, with approximately 20-200 particles in the field of view for each measurement. Ambient temperature was set at 22 °C.
Raw data files ("filename-ExperimentSummary.csv") were parsed as follows for quantification and statistical analysis. Raw particle counts for each size bin were corrected for dilution factors, and then grouped by biological replicate. Each data point represents the mean of all measurements for a biological replicate (Supplementary Data 1). For cEVs, an experimental replicate is defined as cEVs isolated from the same cell line at different passages. For uEVs, a biological replicate is defined as uEVs isolated from a specific biofluid (pre-DRE or post-DRE urine) from one individual. For visualizing particle concentration vs. size distribution for each replicate (Supplementary Fig. 1d, 2c, and S2d), particle concentration was scaled with min-max [0,1] normalization according to Eq. (1).
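The NTA post-processing described above can be sketched as below. This Python illustration assumes the usual min-max form, x' = (x − min) / (max − min), applied across the size bins of one replicate. The function names and the list-per-measurement layout are hypothetical, and parsing of the ExperimentSummary.csv files is not shown.

```python
def dilution_corrected(counts, dilution_factor):
    """Scale per-bin particle counts back to the undiluted sample."""
    return [c * dilution_factor for c in counts]

def mean_profile(profiles):
    """Average several measurements (same size bins) of one biological replicate."""
    n = len(profiles)
    return [sum(bin_values) / n for bin_values in zip(*profiles)]

def min_max_scale(values):
    """Rescale a concentration profile to [0, 1]; a flat profile maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

Scaling each replicate to [0, 1] makes the shapes of size distributions comparable across samples whose absolute particle concentrations differ by orders of magnitude.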

Quantification and statistical analysis
Where appropriate, quantitative analyses are described in the relevant sections of the Methods. Unless stated otherwise, bioinformatic and statistical analyses and plotting were performed using R (v.

Tissue specificity in DRE urines
To determine if DRE enriches for prostate tumor-derived proteins, proteomic profiles for the soluble protein (uSP) and uEV fractions were generated from urines collected pre- and post-DRE. Paired Student's t-tests were used to identify proteins differentially abundant between pre- and post-DRE urines. To determine which proteins are anticipated to be derived from prostate tumors, proteins in this study were annotated based on detection in more than 10 tissue samples of two prostate cancer tissue proteomic datasets: Sinha et al. 18
Fraction-enriched proteins were annotated with subcellular localization information from nine main categories (Secreted, Vesicles, Plasma membrane, Mitochondria, Cytosol, Nuclear Membrane, Nucleoplasm, Nucleoli, and Golgi apparatus) from Human Protein Atlas' Subcellular location data 62 (v.22.0, proteinatlas.org). "Nuclear membrane", "Nucleoplasm" and "Nucleoli" categories were collapsed into one category called "Nucleus". Fisher's Exact Test was used to test for over- or under-representation in each category for each fraction. The magnitude of the enrichment was estimated using the odds ratio (epitools v.0.5-10.1), with the union of proteins detected in fluids (uSP, uEV-P20, and uEV-P150; 6540 proteins) used as a custom background.
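The enrichment test against a custom background can be sketched as follows. This is a Python illustration rather than the authors' R code: the function names are hypothetical, and the plain sample odds ratio (ad/bc) is used as a simple stand-in, so it may differ from the estimate that epitools reports.

```python
from math import comb

def contingency_counts(fraction_proteins, category_proteins, background):
    """2x2 counts for one fraction and one subcellular category within a background."""
    background = set(background)
    fraction = set(fraction_proteins) & background
    category = set(category_proteins) & background
    rest = background - fraction
    a = len(fraction & category)   # in fraction, in category
    b = len(fraction - category)   # in fraction, not in category
    c = len(rest & category)       # not in fraction, in category
    d = len(rest - category)       # not in fraction, not in category
    return a, b, c, d

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P-value: sum of tables no more likely than the observed."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = {x: comb(row1, x) * comb(row2, col1 - x) / denom for x in range(lo, hi + 1)}
    p_obs = probs[a]
    return min(1.0, sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9)))

def sample_odds_ratio(a, b, c, d):
    """Plain cross-product odds ratio; infinite when b*c is zero."""
    return (a * d) / (b * c) if b * c else float("inf")
```

Restricting all four cells to the background set matters here: proteins never detected in any urine fraction cannot contribute to enrichment, so a genome-wide background would inflate the odds ratios.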

Cell line proteomics data
Cells and EVs were collected from three separate passages for each cell line, termed experimental replicates, for proteomics. For each cell line and sample type, only proteins that were present in a minimum of two of three replicates were carried forward for analyses.

Identifying temporally stable proteins
We generated proteomic profiles of uSP, uEV-P20, and uEV-P150 fractions from post-DRE urine from a longitudinal cohort composed of five patients. These patients had cISUP GG 1 tumors and were on active surveillance (Supplementary Data 1). None of the men upgraded in the time that the urines were collected. For statistical analysis of the longitudinal cohort, only reproducibly detected proteins were included. For each urine fraction, we selected proteins that were detected in at least two time points for each patient and detected in at least two patients. This resulted in a total of 1664 uSP proteins, 3365 uEV-P20 proteins, and 1990 uEV-P150 proteins. The similarity of intra-patient and inter-patient proteomes was determined using Spearman's correlation, calculated using the cor() function (use = "pairwise.complete.obs") in the stats package in R (v.4.2.1).
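The pairwise-complete Spearman correlation used above can be sketched as follows. This is a Python illustration of the idea, not a reimplementation of R's cor(): the function names are hypothetical, missing values are represented as None, and ties receive average ranks.

```python
import math

def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; None when either vector has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return None
    return sxy / math.sqrt(sxx * syy)

def spearman_pairwise(x, y):
    """Spearman's rho using only proteins quantified in both samples."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 3:
        return None
    xs, ys = zip(*pairs)
    return pearson(average_ranks(xs), average_ranks(ys))
```

Dropping incomplete pairs before ranking means each sample pair is compared on its own shared set of proteins, which suits label-free data where missing values are common.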
Intra- and inter-individual variance in protein intensities was assessed using linear mixed-effects regression using the lme4 package (v.1.1-31), and the intraclass correlation coefficient (ICC) was measured, which represents the proportion of inter-individual variance relative to the total intra- and inter-individual variance explained by a model 37. Proteins for which a model could not be fitted due to random-effect variances close to zero were excluded. This resulted in a total of 1664 uSP proteins, 3365 uEV-P20 proteins, and 1990 uEV-P150 proteins with estimated ICC values.
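To make the ICC definition concrete, the sketch below computes it for one protein from a one-way random-effects ANOVA. This is a swapped-in estimator chosen for illustration, not the lme4 mixed-model fit used in the study, so the two can disagree for unbalanced designs. The function name and input layout (one list of intensities per patient) are hypothetical.

```python
def icc_oneway(groups):
    """ICC = between-patient variance / (between-patient + within-patient variance).

    groups: one list of log-intensities per patient (time points within patient).
    Variance components come from one-way ANOVA mean squares (illustrative stand-in).
    """
    groups = [g for g in groups if len(g) > 0]
    n_groups = len(groups)
    n_total = sum(len(g) for g in groups)
    if n_groups < 2 or n_total <= n_groups:
        return None
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ms_between = ss_between / (n_groups - 1)
    ms_within = ss_within / (n_total - n_groups)
    # Effective group size; equals the common group size for balanced data.
    k0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (n_groups - 1)
    var_between = max(0.0, (ms_between - ms_within) / k0)
    total = var_between + ms_within
    return var_between / total if total > 0 else None
```

An ICC near 1 indicates a protein whose abundance differs between men but is stable within a man over time, while an ICC near 0 indicates that most variance is within-individual.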

Prostate cancer vs. non-cancer comparisons
Proteins detected in >50% of each sample type were considered for differential abundance analysis. This resulted in a total of 2156 uSP proteins, 3431 uEV-P20 proteins, and 2255 uEV-P150 proteins. Two-sided Mann-Whitney U test was used for comparisons. For each sample type, gene set enrichment analysis (GSEA v.4.3.2) was performed on a pre-ranked list of proteins based on differential abundance (log2 fold change) in prostate cancers and non-cancers. Enrichment analysis was performed against the human MSigDB Hallmarks gene set (v.2022.1) with gene set sizes from 25-500 and 1000 permutations.
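Building the pre-ranked list can be sketched as below. This Python illustration makes two assumptions that the text does not state: the fold change is the ratio of group means over detected values, and undetected proteins are stored as None. The function names are hypothetical, and the Mann-Whitney test and the GSEA run itself are not shown.

```python
import math

def mean_detected(values):
    """Mean of detected (non-missing, positive) abundances, or None if none."""
    found = [v for v in values if v is not None and v > 0]
    return sum(found) / len(found) if found else None

def preranked_list(abundance, is_case, min_detected=0.5):
    """Rank proteins by log2 fold change (cases vs. controls), highest first.

    abundance: {protein: [abundance or None, one entry per sample]}
    is_case:   [True for prostate cancer, False for non-cancer], same order
    Proteins must be detected in MORE than min_detected of all samples.
    """
    ranked = []
    for protein, values in abundance.items():
        detected = sum(1 for v in values if v is not None and v > 0)
        if detected / len(values) <= min_detected:
            continue
        case = mean_detected([v for v, c in zip(values, is_case) if c])
        control = mean_detected([v for v, c in zip(values, is_case) if not c])
        if case is None or control is None:
            continue
        ranked.append((protein, math.log2(case / control)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked
```

The resulting (protein, score) pairs correspond to the two-column ranked input that a pre-ranked enrichment analysis expects.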

Feature selection and machine learning
To generate predictors that distinguish men with prostate cancer from men without cancer (non-cancer, NC), models were trained separately on proteomics data from the uSP, uEV-P20, and uEV-P150 fractions. Patients: uSP NC = 39, uSP PCa = 136, uEV-P20 NC = 22, uEV-P20 PCa = 132, uEV-P150 NC = 25, uEV-P150 PCa = 131. The same cohorts were used to generate predictors that distinguish men with cISUP GG 1 from men with cISUP GG > 1 (uSP: 50 GG 1 vs. 61 GG > 1; uEV-P20: 41 GG 1 vs. 63 GG > 1; uEV-P150: 40 GG 1 vs. 63 GG > 1). To develop predictive models, all datasets were divided into two groups: feature selection (50% of the dataset) and training (50% of the dataset). Within each urinary fraction, proteins that were detected in more than 50% of all samples and temporally stable (intraclass correlation coefficient > 0.4 from serial post-DRE urine collected from three cISUP GG 1 patients at three time points) were passed into feature selection. Three methods were used to select the top 2-15 features within each dataset. For each feature, log2 FC protein abundance was calculated and the significance level was assessed using a two-sided Mann-Whitney U test (wilcox.test). Features with the smallest P-values were selected as the first set of top features. Features with the highest log2 FC and P-value < 0.001 were selected as the second set of top features. Ten-times-repeated five-fold cross-validation (rfeControl) was applied to obtain the third set of top features. Seven machine-learning algorithms were applied to the top features for biomarker identification: generalized linear models, random forest, k-nearest neighbor classification, naïve Bayes, ridge, lasso, and elastic-net-regularized generalized linear models. Receiver operating characteristic (ROC) analysis with leave-one-out cross-validation was used to evaluate model performance using the 'pROC' package (v.1.18.0). Models with the highest area under the ROC curve were chosen and fit to the entire dataset to obtain the final model. Machine-learning algorithms were performed using the caret package (v.6.0.91) in R (v.3.6.1).
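The evaluation step (leave-one-out cross-validation followed by ROC analysis) can be sketched as follows. This Python illustration is not the caret/pROC workflow used in the study. The harness takes any fit/predict pair, and the single-feature midpoint classifier included here is a hypothetical toy model used only to exercise it.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a case outscores a control (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def leave_one_out_scores(X, y, fit, predict):
    """Score each sample with a model trained on all the other samples."""
    scores = []
    for i in range(len(X)):
        model = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(predict(model, X[i]))
    return scores

# Hypothetical toy model: signed distance from the midpoint of the two class means.
def fit_midpoint(X, y):
    mean1 = sum(x[0] for x, l in zip(X, y) if l == 1) / sum(1 for l in y if l == 1)
    mean0 = sum(x[0] for x, l in zip(X, y) if l == 0) / sum(1 for l in y if l == 0)
    return (mean0 + mean1) / 2, (1.0 if mean1 >= mean0 else -1.0)

def predict_midpoint(model, x):
    midpoint, direction = model
    return direction * (x[0] - midpoint)

X = [[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]]
y = [0, 0, 0, 1, 1, 1]
auc = roc_auc(leave_one_out_scores(X, y, fit_midpoint, predict_midpoint), y)
```

Because each held-out sample is scored by a model that never saw it, the pooled scores give an AUC estimate that is less optimistic than one computed on the training data, which matters for the small non-cancer groups in these cohorts.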

Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Fig. 1 | Digital rectal exam enriches for prostate tissue-derived proteins. a Matched pre- and post-DRE urine proteomes from clinical International Society of Urological Pathology (cISUP) Grade Group (GG) 1 patients consisting of unfractionated urine (soluble proteins, uSP) and two subtypes of urinary extracellular vesicles (uEV) isolated by differential ultracentrifugation at 20,000 × g (uEV-P20) and 150,000 × g (uEV-P150). b Transmission electron microscopy images of uEVs isolated from pre- and post-DRE urine from a single individual. Scale bar: 200 nm. Images are representative of three biological replicates. c-e Proteomic differences in pre- and post-DRE urines in uSP (c), uEV-P20 (d) and uEV-P150 (e) fractions. Top panel: log2 fold change (log2 FC) in protein abundance in urine, grouped by protein detection in prostate tissues 18,19. Bonferroni-corrected P-values from two-sided Mann-Whitney U tests. Bottom panel: log2 FC in protein abundances. Prostate-specific proteins as per the Human Protein Atlas are in black. f Differences in pre- vs. post-DRE urine (log2 FC) for select prostate tissue-specific proteins 20. Percentage of 157 prostate cancer (PCa) tissues 18,19 in which each protein was detected on the left. Background shading denotes P-value < 0.05 from a two-sided Wilcoxon signed-rank test. Source data are provided as a Source Data file.

Fig. 4 | Prostate cancer cell line EVs do not fully reflect prostate fluid EVs. a Overview of cell line EV (cEV) isolation from conditioned media. b Overlap in proteins quantified between sample types. c-e Spearman's correlation between log2 mean protein abundance in patient and cell line fractions. Cell line protein abundances are the mean of three experimental replicates from five cell lines. f log2 FC between EV-P150 and EV-P20 in uEV (x-axis) vs. cEV (y-axis). Significantly differentially abundant proteins in both uEV and cEV (two-sided Mann-Whitney U test FDR < 0.05) are pink (more abundant in EV-P150) or green (more abundant in EV-P20). Grey dots represent non-significant proteins (n.s., FDR ≥ 0.05). g Organellar enrichment for proteins (two-proportion z-test) in each quadrant from (d) (left panel), annotated with differences in fraction (middle panel) between EV-P150 (top right quadrant) and EV-P20 (bottom left quadrant) and P-value (right panel). Source data are provided as a Source Data file.

Fig. 7 | EV cargo is context-dependent. a Proteins differentially abundant in each condition and urine fraction by compartment. b Differences in protein abundance of extracellular vesicle markers from ref. 41 in matched uEV-P20 (green) and uEV-P150 (pink) relative to uSP. n = 96 patients. Boxplots are shown with the line indicating the sample median, the box indicating the 25th and 75th percentiles, and the whiskers indicating ±1.5 × IQR. c Fraction-specific proteins were determined by differential protein abundance and frequency of detection (Fig. 2j). Proteins in each subset were filtered based on the following criteria. Tumor markers: |log2 FC| PCa/NC > 1 and FDR PCa/NC < 0.05; Grade markers: |log2 FC| cISUP>1/cISUP=1 > 1 and unadjusted P cISUP>1/cISUP=1 < 0.05; Core markers: |log2 FC| PCa/NC < 1 and FDR PCa/NC ≥ 0.05 and |log2 FC| cISUP>1/cISUP=1 > 1 and unadjusted P cISUP>1/cISUP=1 ≥ 0.05 and >90% of samples and <25% least variant proteins. Tumor, grade, and core markers were further filtered to select cell surface markers (SPC 42 > 2). PCa: Prostate cancer; NC: Non-cancer; SPC: Surface prediction consensus 42. P-values from a two-sided Mann-Whitney U test. d Tumor markers with predicted cell surface localization from (c). Proteins are annotated with the frequency of detection in each fraction (left panel) and differential abundance in PCa vs. NC in uEV (middle panel) and cEV (right panel). Non-cancer (NC) cEVs from RWPE1 cell line. Source data are provided as a Source Data file.
3.1 build 3.1.46). One technical replicate was captured per sample. For each replicate, three 30 s videos were captured, with approximately 20-200 particles in the field of view for each measurement. Ambient temperature was set at 22 °C.
196 men with prostate cancer; 76 tumor samples) and Khoo et al. 19 (40 men with prostate cancer; 81 samples [41 tumor and 40 NAT]); 7438 proteins. To determine which proteins had elevated expression in human prostate tissue, the Human Protein Atlas (v.21.1, updated 2022-05-31) Human Tissue-Specific Proteome 20 from the prostate was used to annotate proteins detected in this study. Proteins were included if they belonged to the 'Tissue enriched', 'Group enriched' or 'Tissue enhanced' categories for the tissue of interest, totaling 127 genes for prostate tissue.